{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/security/alert-similarity/alert-similarity.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/security/alert-similarity/alert-similarity.ipynb)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fTL69Cds1N9P"
      },
      "source": [
        "We perform some `pip install` commands to install the Pinecone client, sentence transformers (which we use for encoding) and other required libraries."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3A7HbpGd1N9R",
        "outputId": "49b8bb86-515c-455e-dfff-9e6fd6fff07b"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Requirement already satisfied: pinecone-client in /usr/local/lib/python3.7/dist-packages (2.0.3)\n",
            "Requirement already satisfied: sentence-transformers in /usr/local/lib/python3.7/dist-packages (2.1.0)\n",
            "Requirement already satisfied: loguru>=0.5.0 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (0.5.3)\n",
            "Requirement already satisfied: dnspython>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (2.1.0)\n",
            "Requirement already satisfied: pyyaml>=5.4 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (6.0)\n",
            "Requirement already satisfied: sentry-sdk>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (1.5.0)\n",
            "Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (2.23.0)\n",
            "Requirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (3.10.0.2)\n",
            "Requirement already satisfied: urllib3>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (1.24.3)\n",
            "Requirement already satisfied: python-dateutil>=2.5.3 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (2.8.2)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.5.3->pinecone-client) (1.15.0)\n",
            "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pinecone-client) (2.10)\n",
            "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pinecone-client) (3.0.4)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->pinecone-client) (2021.10.8)\n",
            "Requirement already satisfied: tokenizers>=0.10.3 in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (0.10.3)\n",
            "Requirement already satisfied: torch>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (1.10.0+cu111)\n",
            "Requirement already satisfied: torchvision in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (0.11.1+cu111)\n",
            "Requirement already satisfied: nltk in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (3.2.5)\n",
            "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (0.1.96)\n",
            "Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (0.2.1)\n",
            "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (1.0.1)\n",
            "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (1.4.1)\n",
            "Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (4.62.3)\n",
            "Requirement already satisfied: transformers<5.0.0,>=4.6.0 in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (4.13.0)\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from sentence-transformers) (1.19.5)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.7/dist-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers) (21.3)\n",
            "Requirement already satisfied: sacremoses in /usr/local/lib/python3.7/dist-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers) (0.0.46)\n",
            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers) (2019.12.20)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers) (3.4.0)\n",
            "Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers) (4.8.2)\n",
            "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=20.0->transformers<5.0.0,>=4.6.0->sentence-transformers) (3.0.6)\n",
            "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers<5.0.0,>=4.6.0->sentence-transformers) (3.6.0)\n",
            "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers<5.0.0,>=4.6.0->sentence-transformers) (7.1.2)\n",
            "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers<5.0.0,>=4.6.0->sentence-transformers) (1.1.0)\n",
            "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->sentence-transformers) (3.0.0)\n",
            "Requirement already satisfied: pillow!=8.3.0,>=5.3.0 in /usr/local/lib/python3.7/dist-packages (from torchvision->sentence-transformers) (7.1.2)\n"
          ]
        }
      ],
      "source": [
        "!pip install -qU pinecone-client sentence-transformers"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Kjzygop31N9S"
      },
      "outputs": [],
      "source": [
        "alert_list = [\n",
        "    '2021-12-13T00:45:31+00:00 File change alert in directory /users/james/documents/projects',\n",
        "    '2021-12-11T12:01:08+00:00 File change alert in directory /users/dan/documents/projects',\n",
        "    '2021-12-10T14:31:12+00:00 New login location for /users/james in location Rome, Italy',\n",
        "    '2021-12-09T18:04:52+00:00 File change alert in directory /users/dan/documents/projects',\n",
        "    '2021-12-09T12:01:41+00:00 Directory change alert in /users/james/documents/projects'\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "B9xRaC_91N9S"
      },
      "source": [
        "### Feature Extraction\n",
        "\n",
        "Given our alerts, we may want to normalize and extract specific features. For full alerts this is much more complex, but in our case all we need to do is remove the timestamp and normalize the directory paths."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "nyzpoUfA1N9T"
      },
      "outputs": [],
      "source": [
        "import re\n",
        "\n",
        "# this regex matches the timestamp\n",
        "timestamp = re.compile(r\"\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\+\\d{2}:\\d{2}\")\n",
        "# this regex matches user directories within a filepath\n",
        "user_dir = re.compile(r\"(?<=\\/users\\/)\\w+\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "j6xjxaPp1N9T"
      },
      "outputs": [],
      "source": [
        "for i, alert in enumerate(alert_list):\n",
        "    alert = timestamp.sub('', alert)\n",
        "    alert = user_dir.sub('<user>', alert)\n",
        "    alert_list[i] = alert.strip()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "5a7OQsEj1N9T",
        "outputId": "327e8ef2-421e-4f94-b563-366269a73b49"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "['File change alert in directory /users/<user>/documents/projects',\n",
              " 'File change alert in directory /users/<user>/documents/projects',\n",
              " 'New login location for /users/<user> in location Rome, Italy',\n",
              " 'File change alert in directory /users/<user>/documents/projects',\n",
              " 'Directory change alert in /users/<user>/documents/projects']"
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "alert_list"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FcNpyKf-1N9U"
      },
      "source": [
        "### Tokenization and Vectorization\n",
        "\n",
        "After extracting the relevant features only, we can go ahead and convert our text into machine-readable vectors. Expel handles this using MinHashing which suits the complex nature of security alerts. As we have only text we can encode them a little easier."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3F9zVA9C1N9U"
      },
      "outputs": [],
      "source": [
        "from sentence_transformers import SentenceTransformer\n",
        "import torch\n",
        "\n",
        "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
        "\n",
        "# initialize a model for creating the vectors\n",
        "model = SentenceTransformer(\n",
        "    'flax-sentence-embeddings/all_datasets_v4_MiniLM-L6',\n",
        "    device=device\n",
        ")\n",
        "# encode the vectors\n",
        "alert_vectors = model.encode(alert_list)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Gb_D2uLD1N9U",
        "outputId": "572589fa-0941-4a34-b4dd-18b82fc0ed0b"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "array([ 2.63850274e-03,  5.77200539e-02,  1.25898756e-02,  2.10825801e-02,\n",
              "        1.08278811e-01, -3.81005332e-02,  9.38568115e-02,  6.46000504e-02,\n",
              "        1.26974329e-01,  9.88046732e-03, -1.58750673e-03,  9.68097337e-03,\n",
              "        1.53104132e-02,  4.15581204e-02, -9.68658403e-02,  4.53900695e-02,\n",
              "       -9.25505087e-02,  4.98605147e-03,  2.75635161e-02,  2.44797170e-02,\n",
              "        2.19219015e-03,  2.97133494e-02,  5.95396981e-02,  4.05759411e-03,\n",
              "        5.13067469e-02, -3.31423171e-02,  3.54301147e-02, -2.88685504e-02,\n",
              "       -1.01434596e-01,  2.20345072e-02,  4.25754040e-02,  1.00562200e-02,\n",
              "        5.89723103e-02, -6.05094843e-02,  4.49954234e-02,  2.97875330e-02,\n",
              "        8.62189829e-02, -3.52832898e-02,  6.59001805e-03,  1.62643765e-03,\n",
              "       -6.18802458e-02, -2.92162057e-02, -7.86487479e-03, -1.83716938e-02,\n",
              "       -9.11098942e-02, -4.29506749e-02,  4.36825491e-02, -8.59973580e-02,\n",
              "       -2.81795841e-02,  3.23357284e-02,  7.09767593e-03, -1.03866287e-01,\n",
              "       -3.44000347e-02, -1.20279513e-01,  5.01218699e-02, -4.33811452e-03,\n",
              "        1.87478829e-02,  4.05522101e-02,  7.86126405e-03, -4.02869694e-02,\n",
              "        5.86897433e-02, -2.53101364e-02, -9.56477597e-03, -1.69195526e-03,\n",
              "       -2.85840128e-02,  3.58164907e-02, -5.39723225e-02,  5.57696112e-02,\n",
              "       -5.95100373e-02, -3.28040193e-03,  3.54249887e-02, -2.08786372e-02,\n",
              "        2.72002034e-02, -3.36746462e-02,  4.34077680e-02, -2.24685986e-02,\n",
              "        6.31191880e-02,  1.11698054e-01,  5.24062142e-02, -7.80271366e-02,\n",
              "        1.62652526e-02, -1.35545945e-02,  3.75552364e-02, -3.62785198e-02,\n",
              "        6.37204349e-02,  1.75498445e-02,  3.22374143e-02,  2.41447259e-02,\n",
              "        4.17345129e-02, -6.20786892e-03,  2.07566060e-02, -7.56666660e-02,\n",
              "        8.54531582e-03,  3.15087251e-02, -6.22902587e-02, -4.23632860e-02,\n",
              "       -3.15827951e-02,  2.62099355e-02,  1.05932184e-01,  1.58440825e-02,\n",
              "       -7.11146370e-02, -7.31627969e-03, -1.46460859e-02, -5.44231273e-02,\n",
              "        5.13802022e-02,  4.57305945e-02, -2.83478089e-02,  6.97018504e-02,\n",
              "        1.24004129e-02, -5.14467470e-02, -3.86871994e-02, -8.66028443e-02,\n",
              "       -1.05114765e-02, -6.09139018e-02, -3.05845086e-02, -3.31903026e-02,\n",
              "       -2.23008972e-02,  1.07396603e-01, -9.18497890e-02,  1.04094096e-01,\n",
              "        1.15318768e-01, -1.00396343e-01, -3.89216989e-02, -7.65511068e-03,\n",
              "       -8.58792802e-04, -1.13718109e-02,  8.22806954e-02, -2.70254967e-33,\n",
              "        7.88033232e-02,  2.78213359e-02, -5.32714948e-02,  3.64148468e-02,\n",
              "        3.04675810e-02, -1.14569485e-01,  4.04095463e-02,  6.24048151e-03,\n",
              "       -6.73794448e-02, -3.79236005e-02,  5.36862127e-02,  4.41863127e-02,\n",
              "        5.65713346e-02, -3.29960547e-02, -1.37264896e-02, -3.05318218e-02,\n",
              "       -1.23337992e-02, -1.47263026e-02,  3.59738134e-02, -1.14430405e-01,\n",
              "       -5.96444681e-02, -3.80513966e-02, -2.23829374e-02,  2.64780708e-02,\n",
              "        7.77011961e-02, -2.19631847e-02,  9.51272547e-02,  1.82623300e-03,\n",
              "        1.04603544e-02, -2.50222627e-02, -1.07257292e-02,  1.73230923e-03,\n",
              "        7.48116300e-02,  1.57120358e-02, -6.54432774e-02,  1.94188636e-02,\n",
              "       -7.47222826e-02,  6.20975951e-03,  1.88839566e-02, -4.93428037e-02,\n",
              "        3.35817151e-02, -3.69731188e-02, -1.27299177e-02,  1.20866327e-02,\n",
              "        1.85845140e-02,  1.95903368e-02,  2.70866193e-02,  1.74267218e-02,\n",
              "        3.73416282e-02, -6.72817081e-02,  1.60219334e-02,  1.85161233e-02,\n",
              "        3.31264432e-03,  9.61358100e-02,  6.91460967e-02, -8.55129398e-03,\n",
              "       -6.98883012e-02, -1.07190656e-02, -5.91058889e-03, -9.62655842e-02,\n",
              "        8.08604360e-02,  3.73142622e-02,  2.47775074e-02, -6.13030531e-02,\n",
              "       -1.90243833e-02, -1.03649795e-02,  1.88350864e-02,  1.29816420e-02,\n",
              "        8.25940520e-02,  7.43430108e-02, -6.88948780e-02, -3.13298330e-02,\n",
              "        1.75394211e-02, -6.82779681e-03, -3.67052667e-02, -1.98432282e-02,\n",
              "       -9.79202166e-02,  1.88010447e-02, -1.41411591e-02,  2.69138231e-03,\n",
              "       -5.16655706e-02, -7.37194270e-02,  2.95592193e-02, -1.25260102e-02,\n",
              "        4.00106125e-02,  5.18534072e-02, -2.35641655e-02,  6.37087524e-02,\n",
              "       -1.05442971e-01,  5.07711396e-02,  4.99037802e-02, -7.80053763e-03,\n",
              "       -4.70592827e-02, -2.02013757e-02,  2.98396144e-02, -2.51546998e-34,\n",
              "        6.33440092e-02, -6.91884756e-02, -8.87726322e-02, -2.87544187e-02,\n",
              "        1.51917562e-02,  2.44423039e-02,  3.99621837e-02,  1.99101251e-02,\n",
              "       -2.91493814e-03,  6.19866438e-02, -4.93257008e-02, -5.48074301e-03,\n",
              "       -3.69426236e-02,  1.67226084e-02, -1.17542390e-02, -4.08189185e-02,\n",
              "       -4.39212136e-02, -2.69003194e-02, -5.61109781e-02, -3.04790866e-02,\n",
              "       -6.71155676e-02,  2.17360239e-02,  5.25493249e-02, -2.43198052e-02,\n",
              "        2.30742544e-02, -2.54812818e-02, -3.96106346e-03, -2.23815367e-02,\n",
              "        3.34204873e-03, -5.64617403e-02, -1.03526570e-01,  1.06934020e-02,\n",
              "       -1.09131224e-01,  2.19890960e-02,  8.26925337e-02, -4.76278067e-02,\n",
              "       -1.03369709e-02, -5.15419468e-02, -5.49735650e-02,  1.06162205e-01,\n",
              "        2.96678767e-02,  5.61360568e-02,  7.29021728e-02,  3.70773822e-02,\n",
              "       -4.62691411e-02,  2.81236470e-02,  1.35271564e-01, -1.45460013e-02,\n",
              "       -7.39129931e-02, -2.53631268e-02,  4.20574099e-02,  3.70417885e-03,\n",
              "        1.16128579e-01, -4.13022190e-02,  3.82449441e-02,  1.03765856e-02,\n",
              "        2.73951814e-02, -1.15543649e-01, -3.82760912e-02,  8.19787234e-02,\n",
              "       -9.75924544e-03, -3.31990197e-02,  2.11858116e-02,  6.87527936e-03,\n",
              "       -8.81113783e-02,  1.95725616e-02, -2.74158623e-02,  4.87011671e-02,\n",
              "        4.65871282e-02, -2.24565901e-02,  4.76167910e-02, -3.21169347e-02,\n",
              "       -7.96410516e-02, -2.50938740e-02,  1.50829284e-02, -7.92562813e-02,\n",
              "        2.19797585e-02, -9.08251554e-02, -5.08066043e-02, -2.55357511e-02,\n",
              "        1.00404672e-01,  3.39660943e-02, -2.77379788e-02,  1.21088023e-03,\n",
              "       -5.10935672e-02,  6.70523122e-02, -3.20650041e-02,  3.75220552e-02,\n",
              "       -1.47409830e-02,  1.76750477e-02, -6.60174116e-02,  8.85456875e-02,\n",
              "        6.63779154e-02, -1.07911132e-01, -1.09216189e-02, -2.01406269e-08,\n",
              "       -7.43079633e-02,  5.23790084e-02,  2.16511488e-02,  1.21332686e-02,\n",
              "        5.19793928e-02, -3.29178534e-02, -9.21363290e-03,  3.43007781e-02,\n",
              "       -2.16829292e-02, -1.65928267e-02,  1.18175205e-02, -5.12614101e-02,\n",
              "        7.86438882e-02, -2.98034623e-02, -1.01861835e-01, -1.18020579e-01,\n",
              "        6.95748404e-02,  2.82584988e-02, -1.90740041e-02, -2.37008203e-02,\n",
              "        7.58684054e-02,  5.74381463e-02, -3.80691253e-02,  2.79441085e-02,\n",
              "        2.65919343e-02, -2.35351026e-02, -2.21430659e-02, -4.42575179e-02,\n",
              "       -6.90534562e-02, -5.22885434e-02, -6.32464420e-03,  1.86705012e-02,\n",
              "       -4.34680469e-03,  8.24591890e-02,  1.83595996e-02,  6.30020872e-02,\n",
              "        5.74136488e-02, -9.05448850e-03,  2.25272290e-02,  6.28998503e-02,\n",
              "        2.56116106e-03,  4.69376370e-02, -2.93050017e-02,  2.92057730e-02,\n",
              "       -1.33351326e-01, -4.84530702e-02,  3.48077863e-02, -7.91741163e-02,\n",
              "       -2.71324944e-02, -8.08887705e-02, -4.25336175e-02, -8.86475667e-03,\n",
              "        4.76235617e-03,  5.36946878e-02,  3.47926165e-03, -4.89818938e-02,\n",
              "        8.54574069e-02,  3.72080156e-03,  2.58305296e-03,  2.21380964e-02,\n",
              "        2.47137807e-02, -1.86638590e-02,  1.09022539e-02, -1.19240687e-03],\n",
              "      dtype=float32)"
            ]
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "alert_vectors[0]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BiiTF_BJ1N9V"
      },
      "source": [
        "### Search with a Vector Index\n",
        "\n",
        "We have our vector representations of the alerts. Now we use a vector index to store those vectors. [Pinecone](https://www.pinecone.io) is a managed vector index that allows us to set this up very easily.\n",
        "\n",
        "To follow along with this step you will need a [free API key](https://app.pinecone.io)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jzrJdwds1N9V"
      },
      "outputs": [],
      "source": [
        "from pinecone import Pinecone\n",
        "\n",
        "pinecone.init(\n",
        "    api_key=\"YOUR_API_KEY\",\n",
        "    environment=\"YOUR_ENV\"  # find next to API key in console\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zJ9sctg21N9V"
      },
      "source": [
        "We need the vector dimension to create a new Pinecone index."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "IyUDXk351N9W",
        "outputId": "60ddd129-1f3e-435f-9a29-5676a997ee5c"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "384"
            ]
          },
          "execution_count": 9,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dim = alert_vectors.shape[1]\n",
        "dim"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "YFBrkyn81N9W"
      },
      "outputs": [],
      "source": [
        "index_name = 'alert-similarity'\n",
        "\n",
        "# we create an index to store the vectors and search through\n",
        "pinecone.create_index(index_name, dimension=dim)\n",
        "# then initialize connection to the index\n",
        "index = pinecone.Index(index_name)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "RsuRV7Qv1N9W",
        "outputId": "e6ef6d0d-322b-453a-a223-499d88df9a21"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'upserted_count': 5}"
            ]
          },
          "execution_count": 11,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# organize the data\n",
        "data = []\n",
        "for i, alert_vec in enumerate(alert_vectors):\n",
        "    data.append((f'id-{i}', alert_vec.tolist()))\n",
        "# upsert the data\n",
        "index.upsert(vectors=data)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FfhnfKeg1N9X"
      },
      "source": [
        "Now we can identify that the four document/directory change alerts are all very similar and differ from the one user location alert."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "S2RVrSv11N9X",
        "outputId": "8d809cfa-6cec-4db7-e97d-56ec08d0b8b7"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'results': [{'matches': [{'id': 'id-1', 'score': 0.601801634, 'values': []},\n",
              "                          {'id': 'id-3', 'score': 0.601801634, 'values': []},\n",
              "                          {'id': 'id-0', 'score': 0.601801634, 'values': []},\n",
              "                          {'id': 'id-4', 'score': 0.564733624, 'values': []},\n",
              "                          {'id': 'id-2', 'score': 0.270546645, 'values': []}],\n",
              "              'namespace': ''}]}"
            ]
          },
          "execution_count": 12,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "query = \"Some change alert in any location\"\n",
        "xq = model.encode([query]).tolist()\n",
        "# and make the query in Pinecone\n",
        "result = index.query(vector=xq, top_k=5)\n",
        "result"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DfNmj7O61N9X"
      },
      "source": [
        "We can map these IDs back to our original sentences."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "kw4_nHVM1N9X",
        "outputId": "2bb34782-3039-465c-82c3-b1a92a8d5e4b"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "0.602\n",
            "File change alert in directory /users/<user>/documents/projects\n",
            "0.602\n",
            "File change alert in directory /users/<user>/documents/projects\n",
            "0.602\n",
            "File change alert in directory /users/<user>/documents/projects\n",
            "0.565\n",
            "Directory change alert in /users/<user>/documents/projects\n",
            "0.271\n",
            "New login location for /users/<user> in location Rome, Italy\n"
          ]
        }
      ],
      "source": [
        "for record in result['matches']:\n",
        "    idx = int(record['id'][-1])\n",
        "    score = round(record['score'], 3)\n",
        "    print(f\"{score}\\n{alert_list[idx]}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Y4ols9z11N9Y"
      },
      "source": [
        "And we can see that both file and directory change alerts score much higher with the query `\"Some change alert in any location\"` than the login location alert."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Mbt5Ec7tKoL0"
      },
      "source": [
        "# Delete the index"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9y6iIor1KoL0"
      },
      "source": [
        "Delete the index once you do not want to use it anymore. Once the index is deleted, you cannot use it again."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vAMnQjvsxSw0"
      },
      "outputs": [],
      "source": [
        "pinecone.delete_index(index_name)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "---"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "example.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.7 (main, Sep 14 2022, 22:38:23) [Clang 14.0.0 (clang-1400.0.29.102)]"
    },
    "orig_nbformat": 4,
    "vscode": {
      "interpreter": {
        "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
